Linux Cubed Series 7: Sunsite


SGML Document | 1996-04-14 | 18KB

<!doctype linuxdoc system>
<article>
<title>Translation system for Linuxconf
<author>Introduction
<abstract>
Linuxconf is a large software component, full of menus and dialogs. To be easily translatable, all messages must be extracted from the C++ source code and placed into dictionaries which can be translated efficiently. A special set of tools has been designed to achieve this. They are described here.
</abstract>

<sect>Introduction

This document describes both how the system works and how translators can use it. It starts by explaining how programmers can use it to produce translatable programs. The section "How to translate" explains how translators can use this system to translate Linuxconf or any program written using this system.

<sect>Principles

To make programs easily translatable, all messages should be placed in dictionaries. A dictionary is made of message entries. Each message has a unique ID and a value. In the C++ source, programmers refer to those messages by ID whenever they want to print or say something. Each time a programmer needs a new message, he has to add it to the message dictionary and reference it from the C++ source code. This is how most systems work (there are other translation systems out there).

The system used by <em/Linuxconf/ is fundamentally different. Messages are defined in the <tt/C++/ source code and the dictionaries are built by scanning all <tt/C++/ source files. Programmers must provide an ID and a value for each message right in the source code. It is much easier (or nicer) to do this in the source code than to go back and forth to the dictionary. Furthermore, the programmer sees the message definition directly in the source. With other systems, only the message ID is visible in the source. Using the magic of the <tt/C/ preprocessor, the message value is not compiled into the object code at all.
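
The following minimal sketch illustrates this last point. It imitates by hand what a generated <tt/.m/ header provides. The table <tt/demo_french/ and the functions <tt/foo_message()/ and <tt/foo()/ are hypothetical stand-ins introduced for this illustration only, and the dictionary is a static table here instead of a binary file loaded at runtime.

<tscreen><verb>
```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for the binary dictionary loaded at runtime.
// A French table is used so that the substitution is visible.
static const char *demo_french[] = { "Appel de la fonction foo" };
static const char **_dictionary_prjfoo = demo_french;

// What a generated .m header provides: the macro keeps only the ID ...
#define MSG_U(id, m) id
// ... and the ID is a slot in the dictionary table.
#define M_MSG1 _dictionary_prjfoo[0]

// The English text is visible to the programmer and to the scanner,
// but the preprocessor drops it: only the table lookup is compiled.
const char *foo_message()
{
    assert(_dictionary_prjfoo != nullptr);  // dictionary must be loaded first
    return MSG_U(M_MSG1, "Entering function foo");
}

// Print the message as the program would.
void foo()
{
    std::puts(foo_message());
}
```
</verb></tscreen>

In this sketch, calling <tt/foo()/ prints the text held in the table, not the English text written in the source.
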
Seen this way, the translation system used by <em/Linuxconf/ yields the same result as other systems. It is just nicer for programmers to use. Let's describe how a programmer uses the system.

<sect1>One dictionary per source directory

It is best to define one message dictionary per sub-project or sub-directory. This is easier to manage and avoids ID name space congestion. For each source directory of <em/Linuxconf/ you have one "<tt/dic/" file and one "<tt/m/" file. Both files are produced simply by doing

<tscreen><verb>
make msg
</verb></tscreen>

This command scans all <tt/C++/ source files of the current directory and updates the file <tt>../messages/sources/DIRECTORY.dic</tt> and the file <tt/DIRECTORY.m/, where <tt/DIRECTORY/ is the name of the current directory. <tt/make msg/ uses the <tt>../translate/msgscan</tt> utility to scan the source. This utility looks for specific constructs in the <tt/C++/ source files. Here they are.

<sect1>The <tt/MSG_U/ macro

The <tt/MSG_U/ macro defines a new message. It defines both its ID and its value. This macro is usable anywhere a <tt/C++/ string would be.

<tscreen><verb>
#include <stdio.h>
#include "prjfoo.m"

int foo()
{
    printf (MSG_U(M_MSG1,"Entering function foo"));
    return 0;
}
</verb></tscreen>

The <tt/U/ stands for unilingual: <tt/MSG_U/ defines a single value.

<sect1>The <tt/MSG_B/ macro

The <tt/MSG_B/ macro is like the <tt/MSG_U/ macro, except that it defines two values, allowing a programmer to code two languages at once. The <tt/B/ stands for bilingual. This has not been used in the <em/Linuxconf/ project but has proven effective for other projects.

<tscreen><verb>
#include <stdio.h>
#include "prjfoo.m"

int foo()
{
    printf (MSG_B(M_MSG1
        ,"Entering function foo\n"
        ,"Démarrage de la fonction foo\n"));
    return 0;
}
</verb></tscreen>

<sect1>The <tt/MSG_R/ macro

The <tt/MSG_R/ macro simply references an already defined message. This message may have been defined in another source file (of the same project).
Like the other macros, <tt/MSG_R/ may be used anywhere a <tt/C++/ string is.

<sect1>The <tt/MSG_VERSION/ macro

This macro has not been used so far. It would allow a programmer to raise the version number of a dictionary, preventing older applications from using a newer, potentially incompatible dictionary. The <tt/msgclean/ utility also changes the version number of the dictionary. The <tt/MSG_VERSION/ macro is still a concept rather than a useful addition. Stay tuned...

<sect1>The magic of the <tt/MSG_/ macros

The <tt/MSG_/ macros perform two tasks. First, they are easily spotted by the <tt/msgscan/ utility. The parsing is simple and reliable even if the <tt/C++/ source code is not functional. Second, they hide the retrieval mechanism (how the message value is retrieved from the binary dictionary at runtime). The <tt/msgscan/ utility produces the <tt/.m/ file, which looks like this for the simple example above.

<tscreen><verb>
FILE prjfoo.m:

extern const char **_dictionary_prjfoo;
#ifndef DICTIONARY_REQUEST
    #define DICTIONARY_REQUEST \
        const char **_dictionary_prjfoo;\
        TRANSLATE_SYSTEM_REQ _dictionary_req_prjfoo\
            ("prjfoo",_dictionary_prjfoo,55,1);\
        void dummy_dict_prjfoo(){}
#endif
#ifndef MSG_U
    #define MSG_U(id,m) id
    #define MSG_B(id,m,n) id
    #define MSG_R(id) id
#endif
#define M_MSG1 _dictionary_prjfoo[0]
</verb></tscreen>

As you see, one global variable is created: <tt/_dictionary_prjfoo/. A special macro <tt/DICTIONARY_REQUEST/ is defined. This macro should be placed in one source file of the project. It is generally placed in the file <tt/_dict.c/ presented later.

<sect>How to use it

To produce a translatable program, do the following:

<itemize>
<item>Replace all string messages with <tt/MSG_U/ or <tt/MSG_B/ macros, giving each message a unique <tt/ID/.
<item>Include (#include) the <tt/.m/ file in each source file using the <tt/MSG_x/ macros. This file is generally named <tt/directory.m/ where <tt/directory/ is the name of the current directory.
<item>Create a file <tt/_dict.c/.
The content of this file is shown below.
<item>Use "<tt/make msg/" to extract the messages. This produces or updates the dictionary file <tt/directory.dic/ and produces the include file <tt/directory.m/.
<item>Compile and link your program.
<item>Use "<tt/make msg.eng/" to produce the English binary dictionary. The file produced should be placed where your program expects it.
</itemize>

We now describe the different steps in more detail.

<sect1>The <tt/make msg/ command and <tt/msgscan/ utility

The <tt/make msg/ command invokes the <tt/msgscan/ utility. This utility scans a set of <tt/C/ or <tt/C++/ source files, updates a dictionary file and produces one include file. Here is the command used to update the dictionary of the sub-project <tt/uucp/ of the <em/Linuxconf/ project.

<tscreen><verb>
../translate/msgscan uucp \
    ../messages/sources/uucp.dic uucp.m EF *.c
</verb></tscreen>

The first argument is the name of the dictionary. The second argument is the path of the dictionary file. As you see, the dictionary files of all projects are kept in a single directory. There are few of them. This eases the work of translators. The third argument is the path of the include file, which is produced in the current directory. The fourth argument gives the letter tags used to identify messages defined with the macros <tt/MSG_U/ and <tt/MSG_B/. Messages defined with <tt/MSG_U/ will be tagged with the letter <tt/E/ (English) and messages defined with <tt/MSG_B/ will be tagged with <tt/E/ for the first value and <tt/F/ (French) for the second.

<sect1>The <tt/_dict.c/ file

It is good practice to place the <tt/DICTIONARY_REQUEST/ macro in a file <tt/_dict.c/. There is generally one such file per directory.
Its content is generally:

<tscreen><verb>
#include "this_directory.m"
#include <translat.h>
DICTIONARY_REQUEST
</verb></tscreen>

At a minimum, the following dependency should be placed in your <tt/makefile/:

<tscreen><verb>
_dict.o: _dict.c this_directory.m
</verb></tscreen>

This ensures that each time you update your dictionary (and the <tt/m/ header file), <tt/_dict.c/ is recompiled, so that the dictionary revision and the number of messages are properly recorded. This avoids executing a program with an obsolete or incompatible binary dictionary. Given that <tt/_dict.c/ is small, the compilation is quick.

<sect1>The <tt/msgcomp/ utility

Once you have compiled and linked your program, you must "compile" all the dictionaries used by your program into one binary dictionary. This is done by the <tt/msgcomp/ utility. Here is the command used when doing "<tt/make msg.eng/" for the <em/Linuxconf/ project. This produces the English binary dictionary.

<tscreen><verb>
../translate/msgcomp -p../messages/sources/ \
    /tmp/linuxconf-msg-1.3.eng eE \
    askrunlevel dialog dnsconf fstab \
    misc main netconf mailconf uucp userconf
</verb></tscreen>

This command takes the dictionaries of the sub-projects <tt/askrunlevel dialog dnsconf fstab misc main netconf mailconf uucp/ and <tt/userconf/ and produces a single binary dictionary. The <tt/-p/ option tells <tt/msgcomp/ to look for those <tt/dic/ files (askrunlevel.dic dialog.dic ...) in the directory <tt>../messages/sources/</tt>. The argument <tt>/tmp/linuxconf-msg-1.3.eng</tt> is the file to produce. The argument <tt/eE/ instructs <tt/msgcomp/ to extract the message values carrying the '<tt/e/' tag. If there is no such value for a given message, the value with the '<tt/E/' tag is used.

<sect2>Convention used for tags

Dictionary files contain the definitions of all messages. Each message may have several values, each identified by a tag letter. When messages are extracted by <tt/msgscan/, it is instructed to associate values with given tags.
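
The fallback performed for a tag argument such as <tt/eE/ can be sketched as follows. This is only an illustration of the rule stated above, not the code of <tt/msgcomp/: the type <tt/MessageValues/ and the function <tt/select_value()/ are hypothetical names, and the in-memory map is an assumption that says nothing about the real <tt/.dic/ file format.

<tscreen><verb>
```cpp
#include <cassert>
#include <map>
#include <string>

// One dictionary entry: tag letter -> message value.
// Hypothetical in-memory form, used only for this sketch.
typedef std::map<char, std::string> MessageValues;

// Return the value of the first tag in `order` that the entry defines,
// or an empty string if the entry defines none of them.
// With order "eE": the 'e' value if present, otherwise the 'E' value.
std::string select_value(const MessageValues& values, const std::string& order)
{
    for (char tag : order) {
        MessageValues::const_iterator it = values.find(tag);
        if (it != values.end())
            return it->second;
    }
    return std::string();
}
```
</verb></tscreen>

With an entry holding both an <tt/E/ and an <tt/e/ value, the order <tt/eE/ selects the <tt/e/ value. With an entry holding only an <tt/E/ value, the same order falls back to it.
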
By convention, we use upper-case letters to identify message values extracted from the source code. Lower-case letters are used by translators. We assume here that programmers are bad writers. We let them give the messages their best shot, and we are allowed to override their work without overwriting it. By giving precedence to the '<tt/e/' tag over '<tt/E/', we are saying that the translators' work overrides the work of programmers, but we are not forcing the translators to rewrite everything.

<sect1>The <tt/msgclean/ utility

The <tt/msgscan/ utility maintains the dictionaries. At some point some messages may become obsolete (unused in any source file). The <tt/msgclean/ utility is used to remove messages without values from the <tt/dic/ file. For the <em/Linuxconf/ project, the <tt/make/ target <tt/msg.clean/ is defined for that purpose.

Be aware that applying <tt/msgclean/ to a dictionary file with obsolete messages has an important side effect. Because some messages are deleted, all following messages are renumbered. All sources using the <tt/m/ include file should be recompiled. To avoid problems, the <tt/msgclean/ utility automatically increases the revision number of the dictionary. This prevents using a dictionary with an incompatible program.

<sect>Usage restriction

The strategy used is mainly targeted at <tt/C++/ code. With some restrictions, it may be used for <tt/C/ code. Here are the main features that probably don't work with <tt/C/.

<descrip>
<tag/static initialization/ In <tt/C++/ one can write the following code.

<tscreen><verb>
static char *tb[]={
    foo(1),foo(22)
};
</verb></tscreen>

where <tt/foo/ is a function. The <tt/C++/ compiler generates the proper initialization code, which will probably be called once. The <tt/MSG_U/ macro (and the others) do not hide a function call, but they are nevertheless dynamic in some sense. <tt/C/ does not support this. Other translation strategies based on dictionaries have the same limitation though.
</descrip>

The example using <tt/static char *tb[]/ also causes a problem in <tt/C++/ if the variable is declared outside of a function. The problem appears because the "hidden" initialization code generated by the compiler is called very early, often before <tt/main()/ is called. Normally, the function <tt/translat_load()/, which brings the dictionary into memory, is called by <tt/main()/. Fortunately, with the current implementation, where <tt/_dictionary_system/ is a pointer, a <tt/seg fault/ is triggered whenever this condition is met. This fault is triggered every time, because all such initializations are performed before main, so the mistake cannot go unnoticed. The strategy is <em/safe/.

<sect>Recommended usage and conventions

<sect1>Naming convention for message IDs

To help people who will translate <em/Linuxconf/, I have used a convention for ID names.

<descrip>
<tag/B_/ Buttons.
<tag/E_/ Error messages start with this.
<tag/F_/ Field labels start with this.
<tag/I_/ Dialog introductions start with this.
<tag/M_/ All menu entries start with this prefix.
<tag/N_/ Notices and warnings start with this.
<tag/P_/ When the user is prompted for a password, the message's ID starts with this.
<tag/Q_/ Identifies a question (generally a Yes/No prompt).
<tag/T_/ Dialog titles start with this.
<tag/X_/ All other messages which fit in no category.
</descrip>

<sect>How to translate

<sect1>Go simple

One way to translate is to go right into the <tt/.dic/ files and add translations for each message using a different tag. Then use the <tt/msgcomp/ utility to extract the proper definition. At first, there is little problem doing this. The <tt/msgscan/ utility reads, updates and saves the <tt/.dic/ file, so your changes won't be lost.

The problem comes from the way software is developed. First we develop and then, when it is stable, we translate. Doing so means that we have to walk through all the <tt/.dic/ files to make sure our translation still fits the original messages (the English version for example).
Those original messages may have changed. A different scheme was chosen for <em/Linuxconf/.

<sect1>Organization of the <tt/messages/ directory

The <tt/messages/ directory contains one subdirectory per language plus one <tt/sources/ directory. The <tt/sources/ directory contains all the <tt/.dic/ files produced by scanning the <tt/C++/ source files. These files are never hand edited. Each language directory has a copy of those <tt/.dic/ files with the proper translation.

A special utility <tt/msgupd/ has been created: it basically compares all messages in the <tt/sources/ directory with the messages in the translated directory. It compares only one language (say the English version). Mostly, <tt/msgupd/ will tell you

<itemize>
<item>Which messages are new.
<item>Which messages have changed (the English wording).
</itemize>

Using that information, you know exactly what you have to do to keep your work in sync with the current release of <em/Linuxconf/. <tt/msgupd/ reorders the translated <tt/.dic/ file (not the one in the <tt/sources/ directory) so that all messages which need work are at the beginning of the file. It also adds a comment (<tt/.dic/ files may have comments like most normal <em/Unix/ configuration files) explaining what has to be done. If the English version of a message was changed, it retags the version in the translated file and adds the new version, plus a comment. The old English message gets the tag "<tt/Z/". You can easily see what has changed.

<sect1>The <tt/msgupd/ utility

The file <tt/rules.mak/ shows the rules for one translation (which is not done yet). Look for the targets <tt/msg.cfr/ and <tt/upd.cfr/. To add a new language, do this:

<itemize>
<item>Create a new empty directory in the <tt/messages/ directory, for example, <tt/mar/ for <em/Alien language/.
<item>Customize <tt/rules.mak/ and add the targets <tt/msg.mar/ and <tt/upd.mar/.
<item>Run the following command.
This will fill the <tt>messages/mar</tt> directory with all the necessary <tt/.dic/ files.
<tscreen><verb>
make upd.mar
</verb></tscreen>
<item>Go into <tt>messages/mar</tt>, edit each <tt/.dic/ file and add the proper translation as needed.
<item>Run the following command to produce the binary dictionary required to run <em/Linuxconf/.
<tscreen><verb>
make msg.mar
</verb></tscreen>
<item>Set the following environment variables and run <em/Linuxconf/.
    <itemize>
    <item>export LINUXCONF_LANG=mar
    <item>export LINUXCONF_DICT=/tmp

    This variable is optional. <em/Linuxconf/ normally looks for its message dictionary in <tt>/usr/lib/linuxconf</tt>. This variable overrides this. The <tt/msg.*/ makefile targets generally produce their output in /tmp. This is useful to test new messages without breaking the current installation of <em/Linuxconf/. Be aware that this mechanism only works if you execute <em/Linuxconf/ as root. For security reasons, a normal user can't override the message dictionary of <em/Linuxconf/ (although he can select a different language from <tt>/usr/lib/linuxconf</tt> if available).
    </itemize>
</itemize>

<sect1>The <tt/msgcomp/ utility

The <tt/msgcomp/ utility has been tweaked to support the distributed directory concept. Mainly, it uses the <tt/.dic/ file in the <tt/sources/ directory as a reference. Message ID numbers are defined from this file. It then (optionally) uses alternative <tt/.dic/ files to pick up extra translations. The ordering of an alternative <tt/.dic/ file is unimportant.

<sect>Licensing

The <em/translate/ directory is part of the <em/Linuxconf/ project but carries a special license. There is no restriction on usage. Feel free to incorporate this system into any project. This simple license does not apply to the rest of <em/Linuxconf/, which is covered by the standard GNU Copyleft license. See the file <tt/LICENSE/ in the root directory.

If you find it useful for other projects, send me a note and some comments if possible.
</article>